Add KimiK2Detector with tool interruption support #19696

Merged: ispobock merged 4 commits into sgl-project:main from JustinTong0323:fix/kimi-k25-implicit-tool-end-token-end-thinking on Mar 3, 2026

JustinTong0323 (Collaborator) commented Mar 2, 2026

Summary

...
sglang/srt/layers/quantization/compressed_tensors/compressed_tensors.py", line 680, in get_moe_scheme
    return CompressedTensorsMxInt4MoE(self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Can't instantiate abstract class CompressedTensorsMxInt4MoE without an implementation for abstract method 'apply_weights'
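
The traceback above is the standard Python error for instantiating a class that leaves an abstract method unimplemented. A minimal sketch of the mechanism follows; `SchemeBase`, `IncompleteScheme`, and `CompleteScheme` are hypothetical stand-ins, not the actual SGLang classes, and the `apply_weights` signature here is assumed for illustration only.

```python
from abc import ABC, abstractmethod


class SchemeBase(ABC):
    """Hypothetical stand-in for a base class that declares apply_weights."""

    @abstractmethod
    def apply_weights(self, layer, x):
        ...


class IncompleteScheme(SchemeBase):
    """Does not override apply_weights, so it cannot be instantiated."""


class CompleteScheme(SchemeBase):
    """Overrides the abstract method, so instantiation succeeds."""

    def apply_weights(self, layer, x):
        # Placeholder body: a real scheme would apply quantized weights here.
        return x


try:
    IncompleteScheme()
except TypeError as exc:
    # The message names the missing method, as in the traceback above.
    error_message = str(exc)

scheme = CompleteScheme()
```

The exact wording of the message differs between Python versions, but it always names the unimplemented method.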

Test plan

  • Added unit tests for KimiK2Detector initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and post-interruption normal text handling.
  • Added integration tests through ReasoningParser API for both streaming and non-streaming Kimi K2 tool interruption scenarios.
  • python -m pytest test/registered/parser/test_reasoning_parser.py passes.

gemini-code-assist (Contributor) commented

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the handling of Kimi K2 models by introducing a dedicated KimiK2Detector. This new detector improves the parsing of model outputs, particularly by enabling more flexible and efficient tool call interruptions. It streamlines the transition from reasoning to tool-calling sections, enhancing the model's ability to interact with external tools. Additionally, it simplifies the server's model-specific adjustments by removing outdated or redundant logic for Kimi K2/K2.5 quantization and MoE backend selection.

Highlights

  • New Detector for Kimi K2 Models: Introduced KimiK2Detector to specifically handle Kimi K2 models' reasoning format, including implicit tool interruption.
  • Enhanced Tool Interruption Handling: The new detector recognizes <|tool_calls_section_begin|> as an immediate end to the reasoning phase, allowing seamless transition to tool calls without requiring an explicit </think> token.
  • Updated Model Mapping: The kimi_k2 model is now correctly mapped to the new KimiK2Detector within the ReasoningParser.
  • Simplified MoE Backend Logic: Removed specific detection and auto-selection logic for Kimi K2/K2.5 int4 compressed-tensors related to the flashinfer_trtllm MoE backend, as it is no longer necessary.
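
The implicit-interruption rule described in the highlights can be sketched for the non-streaming case as below. This is an illustration under assumptions, not the code merged in this PR: `split_reasoning` is a hypothetical helper, the output is assumed to start inside the reasoning phase, and keeping the tool-section marker in the normal text (so a downstream tool-call parser can still see it) is an assumption as well.

```python
THINK_END = "</think>"
TOOL_SECTION_BEGIN = "<|tool_calls_section_begin|>"


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_text, normal_text) for one complete model output.

    Hypothetical helper: assumes the output starts inside the reasoning phase.
    """
    think_idx = text.find(THINK_END)
    tool_idx = text.find(TOOL_SECTION_BEGIN)

    # Explicit end: </think> appears and no tool section precedes it.
    if think_idx != -1 and (tool_idx == -1 or think_idx < tool_idx):
        return text[:think_idx], text[think_idx + len(THINK_END):]

    # Implicit end: the tool-call section begins before any </think>.
    # The marker stays in the normal text (assumed behavior).
    if tool_idx != -1:
        return text[:tool_idx], text[tool_idx:]

    # Neither marker was seen: everything is still reasoning.
    return text, ""


explicit = split_reasoning("plan</think>answer")
implicit = split_reasoning("plan<|tool_calls_section_begin|>call")
unfinished = split_reasoning("still thinking")
```

Here `explicit` splits at `</think>` and drops the token, while `implicit` splits at the tool-section marker and leaves it at the start of the normal text.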

Changelog

  • python/sglang/srt/parser/reasoning_parser.py
    • Added KimiK2Detector class, inheriting from BaseReasoningFormatDetector, to support Kimi K2 models with tool interruption.
    • Updated the ReasoningParser's _detector_map to use KimiK2Detector for the "kimi_k2" model.
  • python/sglang/srt/server_args.py
    • Removed the is_kimi_k2_k25_thinking_int4 variable and its associated logic for detecting specific Kimi K2/K2.5 int4 compressed-tensors.
    • Modified the MoE runner backend auto-selection to no longer consider is_kimi_k2_k25_thinking_int4 for flashinfer_trtllm.
  • test/registered/parser/test_reasoning_parser.py
    • Imported KimiK2Detector.
    • Added TestKimiK2Detector class with unit tests for initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and handling normal text after interruption.
    • Extended test_init_valid_model to include kimi_k2 mapping to KimiK2Detector.
    • Added test_kimik2_tool_interruption to TestReasoningParser for integration testing of Kimi K2 tool interruption.

Activity

  • Unit tests were added for KimiK2Detector initialization, non-streaming parsing with tool interruption, streaming parsing with tool interruption, and post-interruption normal text handling.
  • Integration tests were added through the ReasoningParser API for both streaming and non-streaming Kimi K2 tool interruption scenarios.
  • All existing and new unit tests for test_reasoning_parser.py passed successfully.
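
For the streaming scenarios mentioned above, a marker can arrive split across chunks, so a parser has to hold back any trailing text that might still grow into `</think>` or `<|tool_calls_section_begin|>`. The sketch below shows one way to do that; `StreamingSplitter` and `_held_back` are hypothetical names, and this is not the `KimiK2Detector` implementation from the PR.

```python
MARKERS = ("</think>", "<|tool_calls_section_begin|>")


def _held_back(buffer: str) -> int:
    """Length of the longest buffer suffix that is a proper prefix of a marker."""
    best = 0
    for marker in MARKERS:
        for n in range(min(len(marker) - 1, len(buffer)), 0, -1):
            if buffer.endswith(marker[:n]):
                best = max(best, n)
                break
    return best


class StreamingSplitter:
    """Hypothetical streaming splitter; assumes output starts in reasoning."""

    def __init__(self) -> None:
        self.buffer = ""
        self.in_reasoning = True

    def feed(self, chunk: str) -> tuple[str, str]:
        """Return (reasoning_delta, normal_delta) for this chunk."""
        if not self.in_reasoning:
            return "", chunk

        self.buffer += chunk
        hits = [(self.buffer.find(m), m) for m in MARKERS if m in self.buffer]
        if hits:
            idx, marker = min(hits)  # earliest marker ends the reasoning phase
            reasoning, rest = self.buffer[:idx], self.buffer[idx:]
            if marker == "</think>":
                rest = rest[len(marker):]  # drop the explicit end token
            # The tool-section marker is kept in the normal text (assumed).
            self.buffer = ""
            self.in_reasoning = False
            return reasoning, rest

        # No full marker yet: emit everything except a possible partial marker.
        keep = _held_back(self.buffer)
        cut = len(self.buffer) - keep
        emit, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return emit, ""


splitter = StreamingSplitter()
first = splitter.feed("thinking <|tool_")
second = splitter.feed("calls_section_begin|>call")
third = splitter.feed(" more")
```

In this run the partial marker `<|tool_` is withheld from the first delta and only released once the second chunk completes it.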

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a dedicated KimiK2Detector to handle tool interruptions for Kimi K2 models, which is a great addition. The implementation is clean, and the related cleanup in server_args.py improves code clarity. The accompanying unit and integration tests are thorough and cover various scenarios, ensuring the new detector works as expected. I have one suggestion to refactor some duplicated test logic to improve maintainability.

Comment thread test/registered/parser/test_reasoning_parser.py
- Implemented KimiK2Detector class to handle reasoning format with tool-call sections.
- Updated ReasoningParser to include KimiK2Detector.
- Added unit tests for KimiK2Detector covering initialization, tool interruption detection, and streaming parsing scenarios.

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
…oE class for abstract method implementation

Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
JustinTong0323 force-pushed the fix/kimi-k25-implicit-tool-end-token-end-thinking branch from 2c1c342 to 1e252cb on March 2, 2026 14:29
JustinTong0323 (Collaborator, Author) commented

/tag-and-rerun-ci

kpham-sgl (Collaborator) left a comment

LGTM

@ispobock ispobock merged commit dbf1247 into sgl-project:main Mar 3, 2026
181 of 204 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Mar 3, 2026
Summarizes the probabilistic tool call parsing failure when Kimi-K2.5
skips the </think> token and directly emits <|tool_calls_section_begin|>.
Includes root cause analysis, a reference to the SGLang fix (PR sgl-project#19696), and
questions for the Kimi team regarding expected token generation behavior.

References: sgl-project#18086

https://claude.ai/code/session_01GnsZFpmtACr5U3Wkc7MZ2Z
AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/sglang that referenced this pull request Mar 3, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
AMD-yanfeiwang pushed a commit to AMD-yanfeiwang/sglang that referenced this pull request Mar 3, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Kangyan-Zhou pushed a commit to Kangyan-Zhou/sglang that referenced this pull request Mar 4, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
magicYang1573 pushed a commit to magicYang1573/sglang that referenced this pull request Mar 9, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>
JustinTong0323 added a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026
Signed-off-by: Xinyuan Tong <xinyuantong.cs@gmail.com>